Querying and Ranking XML Documents Based on Data Synopses
Authors
Abstract
Interest in querying and ranking XML documents has grown in recent years. In this paper, we present a new framework for querying and ranking schema-less XML documents based on concise summaries of their structural and textual content. We introduce a novel data synopsis structure that summarizes the textual content of an XML document for efficient indexing. More importantly, we extend the traditional vector space model to rank XML documents effectively over the proposed data synopses. We conduct extensive experiments over XML benchmark data to demonstrate the advantages of the indexing scheme and the effectiveness of our ranking scheme. We also compare our framework with Lucene to demonstrate that our extended TF*IDF scoring function is effective.
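For orientation, the traditional vector space model that the abstract refers to can be sketched as below. This is a minimal illustration of plain TF*IDF with cosine similarity over whitespace-tokenized text, not the extended scoring function or the data synopses proposed in the paper. The function names, the smoothed IDF formula, and the sample texts are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


def build_idf(docs):
    # docs is a list of token lists; df counts the documents containing each term.
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    # Smoothed IDF (one common variant, chosen here for illustration).
    return {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    # Raw term frequency times IDF; terms unseen in the collection are dropped.
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, texts):
    # Returns (score, document index) pairs, best match first.
    docs = [tokenize(t) for t in texts]
    idf = build_idf(docs)
    q = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    sample = [
        "xml query ranking",
        "relational database tuning",
        "xml synopsis index",
    ]
    for score, idx in rank("xml ranking", sample):
        print(idx, round(score, 3))
```

In this baseline, a document is scored only by the terms it shares with the query; structural information such as element paths plays no role, which is the gap an XML-aware extension would need to address.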
Similar resources
Locating XML Documents Using Content and Structure Synopses
In this paper, we present a novel framework for locating schema-less XML documents based on concise data synopses extracted from the documents. We introduce two novel data synopses, content synopsis and positional filter, to summarize the text data in an XML document for the query evaluation. These two data synopses correlate textual with positional information and consider the containment rela...
A synopsis based approach for XML fast approximate querying
XML was born to represent, exchange and publish information on the Web, but it has now spread to many other applications. Due to this success, the W3C has proposed a new query language, XQuery, specifically designed to query XML data. XQuery makes it possible to obtain exact answers to queries; however, when applied to large XML repositories or warehouses, such precise queries may require high response tim...
Indexing and Searching XML Documents Based on Content and Structure Synopses
We present a novel framework for indexing and searching schema-less XML documents based on concise summaries of their structural and textual content. Our search query language is XPath extended with full-text search. We introduce two novel data synopsis structures that correlate textual with positional information in an XML document and improve query precision. In addition, we present a two-ph...
Web Retrieval of XML Documents: Practice and Challenges
The Web is characterized by a huge number of very heterogeneous data sources that differ in both media support and format representation. In this scenario, an integrating approach is needed for querying heterogeneous Web documents. To this purpose, XML can play an important role since it is becoming a standard for data representation and exchange over the Web. Due to its flexibility, XM...
Journal title:
- JDIM
Volume 9, Issue
Pages -
Publication date: 2011